Statistics - Independent Samples t-Test
This article explains the Independent Samples t-Test used in statistics.
We will also walk through an Independent Samples t-Test using the Python Scipy library.
Independent Samples t-Test #
The Independent Samples t-Test is a statistical analysis technique used to test whether there is a statistically significant difference between the means of two independent samples. It is used to determine whether the difference in means between two groups occurred by chance or if it truly exists.
The main steps of the Independent Samples t-Test are as follows:
1. Hypothesis Setting #
| Hypothesis | Meaning |
|---|---|
| H₀ : 𝜇₁ = 𝜇₂ (Null Hypothesis) | The means of the two groups are the same. |
| H₁ : 𝜇₁ ≠ 𝜇₂ (Alternative Hypothesis) | The means of the two groups are different. |
2. Normality Test #
If either group has fewer than 30 observations, a normality test must be conducted.
If both groups have 30 or more observations, normality is assumed to hold by the Central Limit Theorem.
- In Scipy, normality can be checked with the Shapiro-Wilk test (stats.shapiro), as sketched below.
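A minimal sketch of the Shapiro-Wilk call, using randomly generated data purely for illustration:

```python
import numpy as np
from scipy import stats

# Randomly generated example data (20 observations), for illustration only
rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=20)

stat, p = stats.shapiro(scores)  # Shapiro-Wilk test: H0 = the data are normally distributed
print(stat, p)                   # if p > 0.05, normality is not rejected
```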
3. Equality of Variances Test #
If the data counts of the two groups are the same, it is assumed that the variances are equal.
If the data counts of the two groups are different, an equality of variances test can be performed to check if the variances are equal.
- In Scipy, equality of variances can be checked with the Levene test (stats.levene), as sketched below.
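A minimal sketch of the Levene call, again with randomly generated data for illustration:

```python
import numpy as np
from scipy import stats

# Two randomly generated example groups, for illustration only
rng = np.random.default_rng(0)
group1 = rng.normal(loc=50, scale=10, size=25)
group2 = rng.normal(loc=52, scale=12, size=30)

stat, p = stats.levene(group1, group2)  # Levene test: H0 = the group variances are equal
print(stat, p)                          # if p > 0.05, equal variances are not rejected
```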
4. Calculation of Independent Samples t-Statistic #
The independent samples t-statistic is calculated using the means and standard deviations of the two groups.
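For reference, here is a minimal NumPy sketch of the equal-variance (pooled) form of the statistic; Scipy's ttest_ind performs this calculation (or Welch's variant) for us:

```python
import numpy as np

def pooled_t_statistic(x1, x2):
    """Student's t-statistic for two independent samples, assuming equal variances."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = x1.var(ddof=1), x2.var(ddof=1)                   # sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)   # pooled variance
    return (x1.mean() - x2.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))
```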
5. Decision/Conclusion #
If the absolute value of the calculated t-statistic exceeds the critical value (equivalently, if the p-value is smaller than the significance level), the null hypothesis is rejected and the alternative hypothesis is accepted.
Otherwise, the null hypothesis is not rejected.
If there is a statistically significant difference, it is concluded that the means of the two groups differ.
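A minimal sketch of this decision rule for a two-sided test, with hypothetical numbers filled in for illustration:

```python
from scipy import stats

# Hypothetical values, for illustration only
t_stat = 2.3        # calculated t-statistic
n1, n2 = 25, 25     # group sizes
alpha = 0.05        # significance level

dof = n1 + n2 - 2                         # degrees of freedom (equal-variance case)
t_crit = stats.t.ppf(1 - alpha / 2, dof)  # two-sided critical value
print(abs(t_stat) > t_crit)               # True -> reject the null hypothesis
```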
The Independent Samples t-Test is useful for comparing the means of two groups and can be applied in situations such as comparing an experimental group with a control group or comparing outcomes under two different conditions.
Using Python Library Scipy #
Below is how to proceed with the Independent Samples t-Test using the Python Scipy library.
The data consists of concentration test scores collected from two classes: class A, which has many humanities students, and class B, which has many students who regularly do strength training. If strength training really does affect concentration, the average concentration test scores of the two classes may differ.
We want to check, through an Independent Samples t-Test, whether there is a significant difference in concentration between classes A and B.
The hypothesis is as follows:
Null Hypothesis : The means of classes A and B are the same.
Alternative Hypothesis : The means of classes A and B are not the same.
The significance level is set at 0.05.
Let’s first load the data.
>>> import pandas as pd
>>> from scipy import stats
>>> df = pd.read_csv("./data/ch11_training_ind.csv")
>>> df.head()
|   | A  | B  |
|---|----|----|
| 0 | 47 | 49 |
| 1 | 50 | 52 |
| 2 | 37 | 54 |
| 3 | 60 | 48 |
| 4 | 39 | 51 |
Next, let’s conduct a normality test.
>>> a = stats.shapiro(df['A'])
>>> b = stats.shapiro(df['B'])
>>> print(a, b)
ShapiroResult(statistic=0.9685943722724915, pvalue=0.7249553203582764)
ShapiroResult(statistic=0.9730021357536316, pvalue=0.8165789842605591)
Both p-values are greater than 0.05, so the normality assumption is satisfied for both classes.
Next, since the two classes have the same number of observations, we assume equal variances. If the group sizes differed, equality of variances would need to be tested, which can be done through the Levene test as follows:
>>> stats.levene(df['A'], df['B'])
LeveneResult(statistic=2.061573118077718, pvalue=0.15923550057222613)
With a p-value of 0.159 (greater than 0.05), we fail to reject the null hypothesis that the two groups' variances are equal, so equal variances can be assumed.
Next, the t-statistic and p-value can be calculated using ttest_ind in the Scipy library. Here equal_var=False is passed, which performs Welch's t-test and does not rely on the equal-variance assumption; passing equal_var=True (the default) would perform the standard Student's t-test instead.
>>> t, p = stats.ttest_ind(df['A'], df['B'], equal_var=False) # equal_var=False: Welch's method
>>> t, p
(-1.760815724652471, 0.08695731107259362)
Since the p-value is greater than the significance level of 0.05, we fail to reject the null hypothesis that the means of classes A and B are the same. Therefore, we cannot conclude that there is a significant difference in average scores between class A and class B.
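For reference, since the Levene test above did not reject equal variances, the pooled (Student's) t-test could also be used; the call is sketched below (the resulting values should be computed from the actual data rather than copied from here):

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("./data/ch11_training_ind.csv")

# Pooled (Student's) t-test, assuming equal variances
t, p = stats.ttest_ind(df['A'], df['B'], equal_var=True)
print(t, p)
```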